Naive Bayes classifiers (Two Classes)

Approaches

Approach 1: Using LabelEncoder


To graph the features and classes, they must first be converted to numbers. Use the LabelEncoder method to convert the string labels into numbers. LabelEncoder encodes one column at a time, so after encoding, the features must be zipped back into a single list. Although it is possible to use LabelEncoder for features, best practice is to use the OrdinalEncoder, which we cover in Approach 2. More information is available in the SciKit Learn User Guide: https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-categorical-features.

 

# Import LabelEncoder
from sklearn import preprocessing

# Raw string-valued features (the weather dataset used throughout this page)
weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
        'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']

# Creating LabelEncoder
le = preprocessing.LabelEncoder()

# Converting string feature labels into numbers
weather_encoded = le.fit_transform(weather)
temp_encoded = le.fit_transform(temp)
print("Weather:", weather_encoded)
print("Temp:", temp_encoded)

Weather: [2 2 0 1 1 1 0 2 2 1 2 0 0 1]
Temp: [1 1 1 2 0 0 0 2 0 2 2 2 1 2]
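To check which number was assigned to each category, the fitted encoder exposes a `classes_` attribute: labels receive codes in sorted (alphabetical) order. A quick sketch, rebuilding the `weather` list from the dataset above:

```python
from sklearn import preprocessing

weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']

le = preprocessing.LabelEncoder()
le.fit(weather)

# classes_ holds the labels in the order of their numeric codes
mapping = {c: int(code) for c, code in zip(le.classes_, le.transform(le.classes_))}
print(mapping)  # {'Overcast': 0, 'Rainy': 1, 'Sunny': 2}
```

This is why the comments in the prediction cells below read 0:Overcast and 2:Mild.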

 

The class labels also need to be encoded.


# The class labels, also strings
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

# Converting string class labels into numbers
label = le.fit_transform(play)
print("Play:", label)

Play: [0 0 1 1 1 0 1 0 1 1 1 1 1 0]


The LabelEncoder method encodes each feature individually, so the encoded features need to be combined afterward. This extra step is not needed with the OrdinalEncoder method, which we will look at later in this notebook.


# Combining weather and temp into single list of tuples
features=list(zip(weather_encoded,temp_encoded))
print(features)
 
[(2, 1), (2, 1), (0, 1), (1, 2), (1, 0), (1, 0), (0, 0), (2, 2), (2, 0), (1, 2), (2, 2), (0, 2), (0, 1), (1, 2)]


Next, train the model based on the dataset and then return the prediction based on a new value: overcast weather and mild temperatures.

 

TASK: Try changing the variables to predict if the match would take place if it was overcast and hot.

What about sunny and hot?


# Import Categorical Naive Bayes model
from sklearn.naive_bayes import CategoricalNB

# Create a Categorical Classifier
model = CategoricalNB()

# Train the model using the training sets
model.fit(features, label)

# Predict Output
predicted = model.predict([[0, 2]])  # 0:Overcast, 2:Mild

# Predict probability
predict_probability = model.predict_proba([[0, 2]])
print("Predicted Value:", le.inverse_transform(predicted), " with ", predict_probability)
Predicted Value: ['Yes'] with [[0.13043478 0.86956522]]
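To see where the 0.8696 comes from, we can reproduce the estimate by hand. CategoricalNB applies Laplace smoothing (alpha=1 by default) over the 3 categories of each feature, then multiplies the class prior by the smoothed per-feature likelihoods and normalizes. A sketch using the encoded lists printed earlier:

```python
# Reproduce P(Yes | Overcast, Mild) by hand, matching CategoricalNB's
# default Laplace smoothing (alpha=1)
weather = [2, 2, 0, 1, 1, 1, 0, 2, 2, 1, 2, 0, 0, 1]  # 0:Overcast 1:Rainy 2:Sunny
temp    = [1, 1, 1, 2, 0, 0, 0, 2, 0, 2, 2, 2, 1, 2]  # 0:Cool 1:Hot 2:Mild
play    = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]  # 0:No 1:Yes

def p_given_class(feature, value, cls, n_categories, alpha=1.0):
    """Smoothed P(feature = value | play = cls)."""
    in_class = [f for f, y in zip(feature, play) if y == cls]
    count = sum(1 for f in in_class if f == value)
    return (count + alpha) / (len(in_class) + alpha * n_categories)

scores = {}
for cls in (0, 1):
    prior = play.count(cls) / len(play)
    scores[cls] = (prior
                   * p_given_class(weather, 0, cls, 3)   # Overcast
                   * p_given_class(temp, 2, cls, 3))     # Mild

p_yes = scores[1] / (scores[0] + scores[1])
print(round(p_yes, 8))  # 0.86956522, matching predict_proba above
```

Note that Overcast never co-occurs with Play=No in the data; without smoothing, that zero count would force P(No | Overcast, Mild) to exactly 0.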


Approach 2: Using OrdinalEncoder for features


When the dataset has more than one feature, it is best to first combine the features into a single list and then encode them using the OrdinalEncoder method. More information is available in the SciKit Learn User Guide https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-categorical-features.
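OrdinalEncoder is the column-wise counterpart of LabelEncoder: it fits all feature columns at once and stores one category array per column in its `categories_` attribute, again in sorted order. A small self-contained sketch (using a three-row subset for brevity):

```python
from sklearn import preprocessing

training_set = [('Sunny', 'Hot'), ('Overcast', 'Mild'), ('Rainy', 'Cool')]

enc = preprocessing.OrdinalEncoder()
enc.fit(training_set)

# categories_ holds one array per column, in the order of the numeric codes
cols = [[str(x) for x in col] for col in enc.categories_]
print(cols)  # [['Overcast', 'Rainy', 'Sunny'], ['Cool', 'Hot', 'Mild']]
```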


# Get dataset with string features
training_set=list(zip(weather, temp))
print(training_set)
 
[('Sunny', 'Hot'), ('Sunny', 'Hot'), ('Overcast', 'Hot'), ('Rainy', 'Mild'), ('Rainy', 'Cool'), ('Rainy', 'Cool'), ('Overcast', 'Cool'), ('Sunny', 'Mild'), ('Sunny', 'Cool'), ('Rainy', 'Mild'), ('Sunny', 'Mild'), ('Overcast', 'Mild'), ('Overcast', 'Hot'), ('Rainy', 'Mild')]
# Create Ordinal Encoder
enc = preprocessing.OrdinalEncoder()
 
encoded_training_set = enc.fit_transform(training_set)
print(encoded_training_set)
[[2. 1.]
 [2. 1.]
 [0. 1.]
 [1. 2.]
 [1. 0.]
 [1. 0.]
 [0. 0.]
 [2. 2.]
 [2. 0.]
 [1. 2.]
 [2. 2.]
 [0. 2.]
 [0. 1.]
 [1. 2.]]
target_set = label


We can see that the trained model returns the same result.


# Train model again
model2 = CategoricalNB()
model2.fit(encoded_training_set, target_set)

# Predict Output
predicted2 = model2.predict([[0, 2]])  # 0:Overcast, 2:Mild
predict_probability2 = model2.predict_proba([[0, 2]])

print("Predicted Value:", le.inverse_transform(predicted2), " with ", predict_probability2)
Predicted Value: ['Yes'] with [[0.13043478 0.86956522]]
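A side benefit of keeping the fitted OrdinalEncoder around is that new observations can be passed as raw strings and converted with `enc.transform`, rather than looking up the numeric codes by hand. A self-contained sketch rebuilding the dataset and model from this page:

```python
from sklearn import preprocessing
from sklearn.naive_bayes import CategoricalNB

weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
        'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

# Encode features jointly and labels separately
enc = preprocessing.OrdinalEncoder()
le = preprocessing.LabelEncoder()
X = enc.fit_transform(list(zip(weather, temp)))
y = le.fit_transform(play)

model = CategoricalNB().fit(X, y)

# Encode a new observation directly from its raw strings
new_obs = enc.transform([('Overcast', 'Mild')])
print(le.inverse_transform(model.predict(new_obs)))  # ['Yes']
```

This avoids a silent mismatch if the hand-written codes drift out of sync with the encoder's category ordering.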

 
